Merge onboarding and deployment updates#573
Conversation
- read focus/agenda files via storage keys in agent context - write A2A focus updates through storage backend instead of local paths - add S3 local fallback read-through migration wrapper - document S3/fallback environment settings - add boto3 dependency and regression coverage
…cking - Replace sync boto3 writes with aioboto3 async client to eliminate stale connection stalls (30s → <2s on concurrent skill uploads) - Add tcp_keepalive, connect_timeout, and read_timeout to boto3 config - Add aioboto3>=13.0.0 dependency - Simplify skill upload in create_agent: single gather, no semaphore, no nested helper function - Add workspace locking service for HA-safe concurrent agent operations - Extend storage runtime base with versioned write (write_bytes_if_match), get_version, and StorageVersion/WriteCondition/ConditionalWriteResult types - Propagate storage backend through files API, agent tools, and seeder - Add fallback storage backend for local→S3 migration path - Add nginx multi-instance config for HA deployment - Add tests: storage S3, files API storage, agent tools workspace, org sync
…service Conflict resolution decisions: - files.py: keep storage backend, discard main's local filesystem calls (focus checks already present via HEAD's is_focus_file_path) - agent_context.py: use main's focus_service.render_focus_context (cleaner than file-based focus reading) - agent_tools.py: keep HEAD's _execute_workspace_mutation abstraction, merge main's focus tools (list/upsert/complete_focus_item), use main's _append_focus_item via ensure_focus_item service - relationships.py: merge both imports (storage + permissions + Identity) - main.py: keep HEAD's _role_enabled guard, add main's app.models.focus - skill_seeder.py: keep HEAD's storage backend approach - trigger_daemon.py: keep HEAD's runtime extraction, preserve main's _should_skip_non_workday and _handle_okr_report_trigger - heartbeat.py: keep HEAD's storage backend imports - MarkdownRenderer.tsx: keep HEAD's CLAWITHMDTOKEN naming
Lost during merge conflict resolution. Re-adds main's filtering logic to the storage-backend list_files implementation: - Skip .gitkeep files - Hide focus.md / agenda.md at workspace root (managed via Focus API) - Filter enterprise_info at root (already injected as virtual entry)
Merge conflict resolution dropped several defensive checks from main: files.py: - Add is_focus_file_path check to read_file endpoint (410 GONE) - Add .gitkeep filtering to list_enterprise_kb_files agent_tools.py (_execute_workspace_mutation): - Add is_focus_file_path guard to write_file, move_file, delete_file, edit_file (returns user-friendly error directing to Focus API) Enterprise endpoints' path safety is already guaranteed by normalize_storage_key which strips ../ traversal, so the explicit startswith checks from main are not required here.
- Add deploy directory with docker-compose.yml and nginx configs - Add deploy-all-in-one.sh deployment script - Update network to clawith_yaojin_network to avoid conflicts - Resolve agent_tools.py merge conflict with generate_image_custom support
There was a problem hiding this comment.
💡 Codex Review
This indented statement sits at module scope, which makes trigger_daemon.py fail to import with IndentationError. Because backend/app/main.py imports start_trigger_daemon during lifespan startup, this syntax error prevents the backend from starting in all roles, not just worker mode.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| pipe.expire(self._connection_key(connection_id), PRESENCE_TTL_SECONDS) | ||
| pipe.expire(self._agent_index_key(agent_id), PRESENCE_TTL_SECONDS) |
There was a problem hiding this comment.
Keep websocket presence keys alive while connected
Presence keys are given a 180-second TTL only at registration time, but there is no refresh path in this router. Any WebSocket that stays open longer than 3 minutes will age out of Redis even though it is still connected, causing cross-instance message routing and active-session/viewer checks to silently stop working for long-lived sessions.
Useful? React with 👍 / 👎.
Backend: - Add identity_provider_lookup service for unified provider resolution - Enhance Google Workspace OAuth integration - Improve SSO service and auth provider registry - Resolve merge conflict in relationships.py Frontend: - Add Atlas design system (18 components: plates, buttons, inputs, etc.) - Add CosmicBackground and StarField visual effects - Update Login, Onboarding, CompanySetup pages with Atlas theme - Update agent detail and enterprise settings tabs
There was a problem hiding this comment.
💡 Codex Review
This file contains a top-level indented block starting at line 81 without any enclosing statement, which raises SyntaxError: unexpected indent as soon as app.services.trigger_daemon is imported. Because backend/app/main.py imports start_trigger_daemon during application lifespan startup, this parse error prevents the backend from starting in any environment where startup runs.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Use ThreadPoolExecutor for password hashing/verification, update agents API, chat sessions, and agent detail page.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4056d65753
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if normalized_path not in {"workspace", "skills"} and not normalized_path.startswith(("workspace/", "skills/")): | ||
| raise HTTPException(status_code=400, detail="右侧根目录视图是 agent 根目录;上传文件时请放到 workspace/ 或 skills/ 目录下") |
There was a problem hiding this comment.
Normalize upload path before prefix validation
The upload endpoint validates path with startswith("workspace/")/"skills/" before it is normalized, but _agent_storage_key() later removes .. segments; this allows inputs like workspace/../../soul.md to pass validation and be written outside the intended upload roots. In upload_file_to_workspace, a user with normal agent access can therefore overwrite arbitrary files under the agent root instead of being confined to workspace/ or skills/ as the API contract implies.
Useful? React with 👍 / 👎.
| if entry.is_dir: | ||
| keys.extend(await _collect_storage_tree_versions(storage, entry.key)) | ||
| else: | ||
| version = await storage.get_version(entry.key) | ||
| keys.append((entry.key, version.token)) |
There was a problem hiding this comment.
Preserve empty directories when moving folder trees
Directory moves now copy only file entries collected by _collect_storage_tree_versions, which recursively skips directory nodes and returns no record for empty folders. When moving a directory that contains empty subdirectories, those empty subdirectories are dropped (the source tree is removed later), which is a regression from the previous shutil.move behavior and causes silent data-structure loss in workspace trees.
Useful? React with 👍 / 👎.
| tmp = NamedTemporaryFile(delete=False, suffix=suffix) | ||
| tmp.close() |
There was a problem hiding this comment.
Avoid leaking temp files in S3 local materialization
S3StorageBackend.local_path_for() creates a NamedTemporaryFile(delete=False) and returns its path without any lifecycle cleanup, so every preview/extraction/download path that materializes S3 objects leaves orphan files on disk. Under repeated document previews/uploads this accumulates indefinitely and can eventually exhaust container disk space.
Useful? React with 👍 / 👎.
…code - Hoist resolve_email_config_async to entry points to avoid repeated DB calls - Replace COUNT queries with EXISTS pattern for first-user checks - Use exact match (==) instead of ilike for tenant domain lookup to hit index - Add performance indexes for org_members and agents tables - Remove ~320 lines of legacy inline trigger logic already replaced by invoke_agent_for_triggers_runtime
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5a3c20930c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if not await storage.exists(storage_key) or not await storage.is_dir(storage_key): | ||
| return [] |
There was a problem hiding this comment.
Handle virtual directories when listing enterprise KB
Treating directory existence as exists && is_dir breaks enterprise knowledge-base listing on S3-compatible storage, because uploaded files create key prefixes but usually not a real object at the directory key. In that case exists() is false while is_dir() is true, so this branch returns [] even when files are present, making the KB appear empty in multi-instance/object-storage deployments.
Useful? React with 👍 / 👎.
| if await storage.exists(skills_prefix) and await storage.is_dir(skills_prefix): | ||
| for entry in await storage.list_dir(skills_prefix): |
There was a problem hiding this comment.
Load skills from directory prefixes in object storage
The skills index now requires storage.exists(skills_prefix) before listing, but S3-style backends generally do not materialize directory marker objects for .../skills. After uploading skill files, is_dir() can be true while exists() is false, so the loader skips list_dir and returns no skills, removing skill context from prompts whenever object storage is enabled.
Useful? React with 👍 / 👎.
| source_before = await storage.read_text(source_key, encoding="utf-8", errors="replace") if source_exists else None | ||
| destination_before = await storage.read_text(destination_key, encoding="utf-8", errors="replace") if destination_exists else None |
There was a problem hiding this comment.
Avoid reading directory paths as text during moves
This code reads source_before/destination_before based only on exists, so moving folders (or overwriting an existing destination folder) calls storage.read_text(...) on directory keys. Local storage backends will raise IsADirectoryError, causing move_workspace_path to fail with an exception instead of completing the move.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b9f059ddc8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| result = await db.execute( | ||
| select(Tenant).where( | ||
| Tenant.sso_domain.ilike(f"%{domain}%"), | ||
| Tenant.sso_domain == domain, |
There was a problem hiding this comment.
Match tenant domains against stored SSO URL format
detect_tenant_by_email now compares Tenant.sso_domain to the bare email domain (e.g. acme.com), but tenant SSO domains are stored as full URLs (see resolve_tenant_by_domain docs/logic in backend/app/api/tenants.py), so this equality check will miss valid tenants and break automatic tenant association for normal and SSO registration flows unless the DB happens to store raw domains.
Useful? React with 👍 / 👎.
| response = await asyncio.to_thread( | ||
| client.list_objects_v2, | ||
| Bucket=self.bucket, | ||
| Prefix=prefix, | ||
| Delimiter="/", | ||
| ) |
There was a problem hiding this comment.
Paginate S3 directory listings beyond first 1000 keys
list_dir calls list_objects_v2 only once and never follows IsTruncated/NextContinuationToken, so prefixes with more than one page of children will silently return an incomplete directory listing (AWS caps each page), which causes missing files/folders in workspace and enterprise KB browsing for larger trees.
Useful? React with 👍 / 👎.
| response = await asyncio.to_thread( | ||
| client.list_objects_v2, | ||
| Bucket=self.bucket, | ||
| Prefix=prefix, | ||
| ) |
There was a problem hiding this comment.
Paginate S3 tree deletion so folder deletes are complete
delete_tree also reads only the first list_objects_v2 page and sends a single delete_objects batch, so deleting/moving/replacing directories with more than one page of objects leaves trailing files behind, leading to partial deletes and inconsistent workspace state on larger S3-backed folders.
Useful? React with 👍 / 👎.
- Resolve merge conflict in agent_tools.py (execute_code with temp workspace + on_output) - Add alembic merge migration to consolidate multiple heads
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9001214b42
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if trigger_type == "on_message": | ||
| matched_from = str(cfg.get("_matched_from") or "") | ||
| matched_message = str(cfg.get("_matched_message") or "") | ||
| digest = hashlib.sha256(f"{matched_from}\n{matched_message}".encode("utf-8")).hexdigest() | ||
| return f"on_message:{trigger.id}:{digest}" |
There was a problem hiding this comment.
Include event identity in on_message execution keys
on_message idempotency is derived only from matched_from + message text, but trigger_executions enforces a unique (trigger_id, idempotency_key) across all history. That means two different incoming messages with identical content (for example repeated "ok" replies) collapse to the same key, so later events are silently dropped and never invoke the agent even though they are new messages.
Useful? React with 👍 / 👎.
| if trigger_type == "poll": | ||
| current_value = str(cfg.get("_last_value") or "") | ||
| digest = hashlib.sha256(current_value.encode("utf-8")).hexdigest() | ||
| return f"poll:{trigger.id}:{digest}" |
There was a problem hiding this comment.
Add run-specific entropy to poll execution keys
The poll idempotency key is based only on _last_value, which permanently deduplicates any future execution that sees a previously observed value because of the unique (trigger_id, idempotency_key) constraint. This breaks expected poll behavior for recurring fire_on=match conditions and for change sequences like A→B→A, where the second A should fire again but is suppressed.
Useful? React with 👍 / 👎.
| if not model or not model.enabled: | ||
| logger.warning(f"Agent {agent.name}'s model is unavailable, skipping trigger invocation") | ||
| return |
There was a problem hiding this comment.
Mark claimed executions terminal before early return
This return path exits without calling mark_trigger_executions_completed or mark_trigger_executions_failed, leaving claimed rows in processing. Since the claimant later reclaims expired processing leases, triggers for missing/disabled agents/models will be retried forever (log churn and queue noise) instead of reaching a terminal status.
Useful? React with 👍 / 👎.
Keep async password functions (hash_password_async, verify_password_async) which are used throughout the file.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c458cc9279
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if destination_is_dir: | ||
| await storage.delete_tree(destination_key) |
There was a problem hiding this comment.
Validate source before deleting overwrite destination
When overwrite=true, this branch deletes the existing destination immediately, but source conflict checks happen later (e.g. the expected_source_version_token mismatch path). If the move then returns a conflict, the call fails after already deleting destination data, causing irreversible data loss for what should be a rejected move. Reorder the operation so source preconditions are verified before any destructive delete on destination.
Useful? React with 👍 / 👎.
| agent_dir.mkdir(parents=True, exist_ok=True) | ||
| if not await storage.exists(agent_prefix) and not await storage.is_dir(agent_prefix): | ||
| return agent_dir | ||
| for entry in await storage.list_dir(agent_prefix): | ||
| await self._materialize_entry(storage, entry.key, agent_dir) |
There was a problem hiding this comment.
Clear stale local workspace before materializing from storage
This materialization path only writes files that exist in shared storage and never removes pre-existing local files under agent_dir. In S3/fallback deployments, deleting a file from shared storage and then restarting the container leaves the old local copy mounted into OpenClaw, so the agent still sees stale/deleted content. The local workspace needs a cleanup/sync step before repopulating from storage.
Useful? React with 👍 / 👎.
Summary
Testing